Scalable model-based cluster analysis using clustering features
Similar resources
Scalable model-based cluster analysis using clustering features
We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize data into sub-clusters, and then generate Gaussian mixtures from their clustering features using a new algorithm — EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model. It provably c...
Scalable, Balanced Model-based Clustering
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process: iterative model re-estimation and sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balance-constrain...
On Model-Based Clustering, Classification, and Discriminant Analysis
The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...
Scalable Clustering using MapReduce Programming Model
The aim is to implement a clustering algorithm that runs in a distributed computing environment, for which a multi-node Hadoop cluster supporting the Hadoop Distributed File System and the MapReduce programming model has been set up. In this paper, Exclusive and Complete Clustering (ExCC), a grid-based algorithm, is implemented by scheduling consecutive MapReduce jobs, for mass...
MutantX-S: Scalable Malware Clustering Based on Static Features
The current lack of automatic and speedy labeling of a large number (thousands) of malware samples seen every day delays the distribution of malware signatures, leading to a low detection rate of new malware samples in the wild. In this paper, we design, implement and evaluate a novel, scalable framework, called MutantX-S, that can efficiently cluster a large number of samples into families base...
Journal
Journal title: Pattern Recognition
Year: 2005
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2004.07.012